$f'(x)=-2x+2=0\Rightarrow x=1$
The condition that the derivative equals zero at a point is necessary for that point to be an extremum: a point with zero derivative may be an extremum, but does not have to be, while a point with nonzero derivative cannot be one. We may call this the (simplified) First-Order Necessary Condition (FONC).
if(!require(Deriv)) install.packages('Deriv')
library("Deriv")
f<-function(x){-x^2+2*x+4}
f_<-Deriv(f)
x<-seq(from=-5,to=5,by=0.1)
plot(x,f(x),type="l",col="blue",xlim=c(-2,4),ylim=c(-8,8),xlab='',ylab='')
lines(x,f_(x),type="l",col="red")
legend(-2,8, legend=c("f(x)=-x^2+2*x+4", "f'(x)=-2x+2"),col=c("blue", "red"), lty=1, cex=1.3)
grid()
if $f''(x)>0 \Rightarrow $ x is a local minimum point
if $f''(x)<0 \Rightarrow $ x is a local maximum point
if $f''(x)=0 \Rightarrow $ the test is inconclusive. One has to investigate further, by taking higher-order derivatives or gathering more information about the graph.
Satisfying both conditions, that is, a zero first derivative and a nonzero second derivative, is sufficient to claim that the point is a local extremum of the function. A nonzero second derivative together with a satisfied FONC may then be called the (simplified) Second-Order Sufficient Condition (SOSC).
In our case:
$f''(1)=-2<0$, and the point $x=1$ is a local maximum.
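The second-derivative test can also be carried out numerically; a minimal sketch using the Deriv package loaded for the plot above (differentiating the derivative once more):

```r
# Second-derivative test for f(x) = -x^2 + 2x + 4 at the stationary point x = 1
library(Deriv)
f  <- function(x) {-x^2 + 2*x + 4}
f1 <- Deriv(f)    # f'(x)  = -2x + 2
f2 <- Deriv(f1)   # f''(x) = -2
f1(1)             # FONC: first derivative is zero at x = 1
f2(1)             # SOSC: second derivative is negative -> local maximum
```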
$f'(x)=3x^2=0\Rightarrow x=0$
$f''(x)=6x \Rightarrow f''(0)=0$, and the test is inconclusive
The fact that the test is inconclusive means that the point may be an extremum, but does not have to be. In other words, a zero first derivative together with a zero second derivative does not rule the point out as an extremum. A zero first derivative together with $f''(x)\geq0$ (for a minimum) or $f''(x)\leq0$ (for a maximum) may then be called the (simplified) Second-Order Necessary Condition (SONC).
Moving back to the example, one may compute the concavity to the left and to the right of 0 and deduce that 0 is an inflection point. This is not the focus of our lecture.
f<-function(x){x^3}
f_<-Deriv(f)
x<-seq(from=-2,to=2,by=0.1)
plot(x,f(x),type="l",col="blue",ylim=c(-8,12),xlab='',ylab='')
lines(x,f_(x),type="l",col="red")
legend(-2,8, legend=c("f(x)=x^3", "f'(x)=3x^2"),col=c("blue", "red"), lty=1, cex=1.3)
grid()
Let $f$ be defined on a set $S$ in ${\rm I\!R}^n$ and let $x^*=(x_1^*,\dots ,x_n^*)$ be an interior point of $S$ at which $f$ has partial derivatives. A necessary condition for $x^*$ to be a maximum or minimum point of $f$ is that $x^*$ is a stationary point for $f$ - that is, it satisfies the equations $\frac{\partial f}{\partial x_i}(x^*)=0$ for $i=1,\dots,n$.
Analogously to the one-variable case, we search for points at which the gradient is the zero vector (this means that all of the partial derivatives need to equal zero). Consider $f(x,y)=-x^2-2y^2$:
$f_x'=-2x=0\Rightarrow x=0$
$f_y'=-4y=0\Rightarrow y=0$
The candidate solution is the point (0,0)
In the case of multiple variables, all combinations of second partial derivatives need to be checked. They are gathered in a matrix called the Hessian matrix.
$H=\begin{bmatrix}f''_{xx} & f''_{xy} \\ f''_{yx} & f''_{yy}\end{bmatrix}= \begin{bmatrix}-2 & 0 \\ 0 & -4\end{bmatrix}$
A symmetric matrix $A \in {\rm I\!R}^{n\times n}$ is called positive definite if $x^TAx>0$ for every $\boldsymbol 0\neq \boldsymbol x\in {\rm I\!R}^n$
A symmetric matrix $A \in {\rm I\!R}^{n\times n}$ is called negative definite if $x^TAx<0$ for every $\boldsymbol 0\neq \boldsymbol x\in {\rm I\!R}^n$
A symmetric matrix $A \in {\rm I\!R}^{n\times n}$ is called positive semidefinite if $x^TAx\geq0$ for every $\boldsymbol 0\neq \boldsymbol x\in {\rm I\!R}^n$
A symmetric matrix $A \in {\rm I\!R}^{n\times n}$ is called negative semidefinite if $x^TAx\leq0$ for every $\boldsymbol 0\neq \boldsymbol x\in {\rm I\!R}^n$
A symmetric matrix $A \in {\rm I\!R}^{n\times n}$ is called indefinite if there exist $\boldsymbol x, \boldsymbol y \in {\rm I\!R}^n$ such that $x^TAx>0$ and $y^TAy<0$
The Second-Order Sufficient Condition (SOSC) for a stationary point to be a minimum is that the Hessian matrix is positive definite; the Second-Order Necessary Condition (SONC) for a minimum is that the Hessian matrix is positive semidefinite.
Analogously, the SOSC for a maximum is that the Hessian matrix is negative definite, and the SONC for a maximum is that the Hessian matrix is negative semidefinite.
$\quad [z_1\;z_2]\begin{bmatrix}-2 & 0 \\ 0 & -4\end{bmatrix}\begin{bmatrix}z_{1} \\z_{2}\end{bmatrix}=-2z_1^2-4z_2^2$
The Hessian is negative definite, and thus the point $(0,0)$ is a maximum.
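An equivalent numerical check, not used in the derivation above but convenient in R: a symmetric matrix is negative definite exactly when all of its eigenvalues are negative.

```r
# Definiteness of the Hessian of f(x, y) = -x^2 - 2y^2 via eigenvalues:
# a symmetric matrix is negative definite iff all eigenvalues are < 0
H <- matrix(c(-2,  0,
               0, -4), nrow = 2, byrow = TRUE)
eigen(H)$values   # both negative -> negative definite -> local maximum
```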
Below is the code for the solution in R. The code had to be adjusted (specifically, the function multiplied by $-1$) because the solnp function minimizes, rather than maximizes, the objective.
library(Rsolnp)
fn <- function(x) {x[1]^2 + 2*x[2]^2}
x0 <- c(1, 1) # setup any init values
sol1 <- solnp(x0, fun = fn)
sol1[1]
sol1[2]
sol1[4]
sol1[5]
pars: Optimal parameters.
convergence: Indicates whether the solver has converged (0) or not (1 or 2).
lagrange: The vector of Lagrange multipliers.
hessian: The Hessian of the augmented problem at the optimal solution.
if(!require(plotly)) install.packages('plotly')
library("plotly")
x = seq(from=-1,to=1,by=0.1)
y = seq(from=-1,to=1,by=0.1)
z=outer(x,y,function(x,y)-x^2-2*y^2)
fig <- plot_ly(type = 'surface',x = x,y = y,z = z)
fig
Again we search for points at which the gradient is the zero vector. Consider $f(x,y)=x^2-y^2$:
$f_x'=2x=0\Rightarrow x=0$
$f_y'=-2y=0\Rightarrow y=0$
The candidate solution is the point (0,0)
$H=\begin{bmatrix}f''_{xx} & f''_{xy} \\ f''_{yx} & f''_{yy}\end{bmatrix}= \begin{bmatrix}2 & 0 \\ 0 & -2\end{bmatrix}$
$\quad [z_1\;z_2]\begin{bmatrix}2 & 0 \\ 0 & -2\end{bmatrix}\begin{bmatrix}z_{1} \\z_{2}\end{bmatrix}=2z_1^2-2z_2^2$
The Hessian matrix is indefinite: the sign of the quadratic form changes depending on the values of $z_1$ and $z_2$. The point is a saddle point.
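The same eigenvalue check as before (equivalent to inspecting the quadratic form) confirms indefiniteness here: the eigenvalues have mixed signs.

```r
# Hessian of f(x, y) = x^2 - y^2: eigenvalues of mixed sign -> indefinite
H <- matrix(c(2,  0,
              0, -2), nrow = 2, byrow = TRUE)
ev <- eigen(H)$values
any(ev > 0) && any(ev < 0)   # TRUE -> indefinite -> saddle point
```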
library(Rsolnp)
fn <- function(x) {x[1]^2 - x[2]^2}
x0 <- c(1, 1) # setup any initial values
sol1 <- solnp(x0, fun = fn)
sol1[1]
sol1[2]
sol1[4]
sol1[5]
x = seq(from=-1,to=1,by=0.1)
y = seq(from=-1,to=1,by=0.1)
z=outer(x,y,function(x,y)x^2-y^2)
fig <- plot_ly(type = 'surface',x = x,y = y,z = z)
fig
A general optimization problem with equality constraints is of the form
max(min) $f(x_1,\dots,x_n)$ subject to $\begin{equation} \begin{cases} g_1(x_1,\dots,x_n)=b_1\\ \dots \\ g_m(x_1,\dots,x_n)=b_m\\\end{cases}\end{equation}$ $(m<n)$
In vector formulation, the problem is
max(min) $f(\boldsymbol x)$ subject to $g_j(\boldsymbol x)=b_j, j=1,\dots,m (m<n)$
The standard procedure for solving this problem is to form the Lagrangian
$\mathcal{L}(\boldsymbol x)=f(\boldsymbol x)-\lambda_1g_1(\boldsymbol x)-\dots-\lambda_mg_m(\boldsymbol x)$, where the $\lambda's$ are called the Lagrange multipliers.
Consider, as an example, optimizing $f(x,y)=x^2+y^2$ subject to $x^2+2y^2=1$. The Lagrangian equals $\mathcal{L}(x,y)=x^2+y^2-\lambda_1(x^2+2y^2-1)$
$\frac{\partial\mathcal{L}(\boldsymbol x)}{\partial(x_i)} = \frac{\partial f(\boldsymbol x)}{\partial x_i}-\sum_{j=1}^{m}\lambda_j\frac{\partial g_j(\boldsymbol x)}{\partial x_i}=0,\;\;\;\;\; i=1,\dots,n$
This is equivalent to saying that we want to find the points at which the gradient of the Lagrangian equals the zero vector. Studying the equation, we may conclude that the condition is satisfied at the points where the gradient of the objective is a linear combination of the gradients of the constraints.
We also require that the gradients of the constraints are linearly independent at the analyzed points. This requirement is called constraint qualification.
Our First Order Necessary Conditions are:
A) $\frac{\partial\mathcal{L}}{\partial x}=2x-2\lambda_1x=0$
B) $\frac{\partial\mathcal{L}}{\partial y}=2y-4\lambda_1y=0$
C) $\frac{\partial\mathcal{L}}{\partial \lambda_1}=-x^2-2y^2+1=0 \Rightarrow x^2+2y^2-1=0$
As we can see, satisfying the FONC means finding the points at which the partial derivatives equal zero and the constraint is satisfied. Writing out condition C) may seem redundant, as it simply restates the constraint itself.
Solving the set of equations, we have:
A) $2x(1-\lambda_1)=0 \Rightarrow x=0 \lor \lambda_1=1$
B) $2y(1-2\lambda_1)=0 \Rightarrow y=0 \lor \lambda_1=\frac{1}{2}$
The two equations above create four cases that need to be analyzed:
1) $x=0 \land y=0 \Rightarrow $ constraint not satisfied
2) $x=0\land\lambda_1=\frac{1}{2}\Rightarrow2y^2=1\Rightarrow y=\frac{1}{\sqrt{2}}\lor y=-\frac{1}{\sqrt{2}}$
3) $y=0 \land \lambda_1=1 \Rightarrow x^2=1 \Rightarrow x=1 \lor x=-1 $
4) $ \lambda_1=\frac{1}{2} \land \lambda_1=1 \Rightarrow$ contradiction
Cases 2) and 3) result in four points that meet the FONC criteria; let's call them $P_1$, $P_2$, $P_3$ and $P_4$ respectively.
| Point | x | y | $\lambda_1$ | f(x,y) |
|---|---|---|---|---|
| $P_1$ | 0 | $\frac{\sqrt{2}}{2}$ | $\frac{1}{2}$ | $\frac{1}{2}$ |
| $P_2$ | 0 | $-\frac{\sqrt{2}}{2}$ | $\frac{1}{2}$ | $\frac{1}{2}$ |
| $P_3$ | 1 | $0$ | $1$ | $1$ |
| $P_4$ | -1 | $0$ | $1$ | $1$ |
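As a sanity check, the gradient of the Lagrangian can be evaluated at the tabulated points; a sketch using the Deriv package from the earlier chunks:

```r
# Gradient of L(x, y, l1) = x^2 + y^2 - l1*(x^2 + 2y^2 - 1),
# evaluated at candidate points from the table above
library(Deriv)
L <- function(x, y, l1) x^2 + y^2 - l1*(x^2 + 2*y^2 - 1)
gradL <- Deriv(L, c("x", "y", "l1"))
gradL(x = 0, y = sqrt(2)/2, l1 = 1/2)   # P1: all components zero
gradL(x = 1, y = 0,         l1 = 1)     # P3: all components zero
```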
We introduce the idea of the Jacobian: a matrix of first derivatives of the constraints. In the case of our example, it is simply the gradient of our one active constraint $g(x,y)=x^2+2y^2-1$:
$$ \mathbf{J} = \frac{\partial g}{\partial \boldsymbol x} = \left[ \frac{\partial g}{\partial x} \;\; \frac{\partial g}{\partial y} \right] = \left[ 2x \;\; 4y \right] $$
Once again we need to check the definiteness of the Hessian matrix (precisely, the Hessian of the Lagrangian), but now restricted to the nullspace of the Jacobian: the vectors "surrounding" the Hessian must lie in the nullspace of the Jacobian at the candidate point. How do we calculate the nullspace?
$J(\boldsymbol x^*)\,Z=\boldsymbol 0$
$\cdot$ Case $P_1$: $[0\;\;2\sqrt{2}]\begin{bmatrix}z_{1} \\z_{2}\end{bmatrix}=0\Rightarrow 2\sqrt{2}z_2=0\Rightarrow z_2=0\Rightarrow Z=\begin{bmatrix}z_{1} \\ 0\end{bmatrix}$
$\cdot$ Case $P_2$: $[0\;\;-2\sqrt{2}]\begin{bmatrix}z_{1} \\z_{2}\end{bmatrix}=0\Rightarrow -2\sqrt{2}z_2=0\Rightarrow z_2=0\Rightarrow Z=\begin{bmatrix}z_{1} \\ 0\end{bmatrix}$
$\cdot$ Case $P_3$: $[2\;\;\;0]\begin{bmatrix}z_{1} \\z_{2}\end{bmatrix}=0\Rightarrow 2z_1=0\Rightarrow z_1=0\Rightarrow Z=\begin{bmatrix}0 \\ z_2\end{bmatrix}$
$\cdot$ Case $P_4$: $[-2\;\;\;0]\begin{bmatrix}z_{1} \\z_{2}\end{bmatrix}=0\Rightarrow -2z_1=0\Rightarrow z_1=0\Rightarrow Z=\begin{bmatrix}0 \\ z_2\end{bmatrix}$
We calculated the nullspace vectors of each candidate point. Now, how does the process of calculating the definiteness of Hessian matrix look in the case of optimization with equality constraints? It is very similar to previous cases, but now the Hessian is based on our Lagrangian function.
$H=\begin{bmatrix}f''_{xx} & f''_{xy} \\ f''_{yx} & f''_{yy}\end{bmatrix}\;\;\;$ of $\;\;\;\mathcal{L}(x,y)=x^2+y^2-\lambda_1(x^2+2y^2-1)$
$H_L=\begin{bmatrix}2-2\lambda_1 & 0 \\ 0 & 2-4\lambda_1\end{bmatrix}$ The $H_L$ may differ between points, as $\lambda_1$ differs.
Now, for each point, calculate $Z^TH_LZ$. The nullspace $Z$ of the two points $P_1$ and $P_2$ is identical, as are their $\lambda_1$ values. This means that the definiteness of their Hessian matrices will be the same. By the same argument, the definiteness of the Hessian matrices of $P_3$ and $P_4$ is also the same. Therefore:
$\cdot$ Case $P_1$ and $P_2$:
$\quad [z_1\;\;0]\begin{bmatrix}1 & 0 \\ 0 & 0\end{bmatrix}\begin{bmatrix}z_{1} \\0\end{bmatrix}=z_1^2 \Rightarrow$ The Hessian is positive definite on the nullspace, so $P_1$ and $P_2$ are minima.
$\cdot$ Case $P_3$ and $P_4$:
$\quad [0\;\;z_2]\begin{bmatrix}0 & 0 \\ 0 & -2\end{bmatrix}\begin{bmatrix}0 \\z_{2}\end{bmatrix}=-2z_2^2 \Rightarrow$ The Hessian is negative definite on the nullspace, so $P_3$ and $P_4$ are maxima.
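The projected quadratic forms above can be reproduced directly in R; a sketch for $P_1$ and $P_3$, using the nullspace directions found earlier:

```r
# Z^T H_L Z on the nullspace of the constraint Jacobian,
# with H_L = diag(2 - 2*l1, 2 - 4*l1)
HL <- function(l1) matrix(c(2 - 2*l1, 0,
                            0, 2 - 4*l1), nrow = 2, byrow = TRUE)
Z1 <- c(1, 0)                    # nullspace direction at P1 (J = [0, 2*sqrt(2)])
Z3 <- c(0, 1)                    # nullspace direction at P3 (J = [2, 0])
drop(t(Z1) %*% HL(1/2) %*% Z1)   #  1 > 0 -> minimum at P1 (and P2)
drop(t(Z3) %*% HL(1)   %*% Z3)   # -2 < 0 -> maximum at P3 (and P4)
```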
Maximize $f(x,y)=x^2+2y$ subject to $ \begin{equation} \begin{cases} g_1=x^2+y^2-5\leq0\\ g_2=-y\leq0\\ \end{cases} \end{equation}$
$\mathcal{L}=x^2+2y-\lambda_1(x^2+y^2-5)-\lambda_2(-y)$
A) $\frac{\partial\mathcal{L}}{\partial x}=2x-2\lambda_1x=0\Rightarrow2x(1-\lambda_1)=0\Rightarrow x=0 \lor \lambda_1=1$
B) $\frac{\partial\mathcal{L}}{\partial y}=2-2\lambda_1y+\lambda_2=0$
C) $\lambda_1\geq0\;\;(\lambda_1=0\ \text{if}\ x^2+y^2-5<0)$
D) $\lambda_2\geq0\;\;(\lambda_2=0\ \text{if}\ y>0)$
The conditions above are the Karush-Kuhn-Tucker (KKT) conditions. The additional conditions C) and D) are called the complementary slackness conditions.
$\cdot$ C) and D) are binding $\Rightarrow x^2+y^2=5,y=0$
$\quad$ case A)$\;x=0 \Rightarrow x^2+y^2\neq5$ contradiction
$\quad$ case A)$\;\lambda_1=1\Rightarrow \lambda_2=-2$ contradiction
$\cdot$ C) is active, D) is inactive $\Rightarrow x^2+y^2=5,\lambda_2=0,y>0$
$\quad$ case A)$\;x=0\Rightarrow y=\pm\sqrt5$
$\quad \quad$ if $y=\sqrt5 \Rightarrow \lambda_1=\frac{1}{\sqrt5}$, and the point satisfies the constraints
$\quad \quad$ if $y=-\sqrt5 \Rightarrow$ $y<0$, contradiction
$\quad$ case A) $\lambda_1=1\Rightarrow y=1,x=\pm2$ both points satisfy the constraints
$\cdot$ C) is inactive, D) is active $\Rightarrow \lambda_1=0,x^2+y^2-5<0,y=0,\lambda_2\geq0$
$\quad\lambda_2=-2$ contradiction
$\cdot$C) and D) are inactive $\Rightarrow \lambda_1=0,x^2+y^2-5<0,\lambda_2=0,y>0$
$\quad 2=0$ contradiction
| No. | x | y | $\lambda_1$ | $\lambda_2$ | f(x,y) |
|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | 0 | 6 |
| 2 | -2 | 1 | 1 | 0 | 6 |
| 3 | 0 | $\sqrt5$ | $\frac{1}{\sqrt5}$ | 0 | $2\sqrt5$ |
Comparing the objective values, the maximum $f=6$ is attained at the two points $(\pm2,1)$.
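The KKT candidates can be cross-checked numerically with solnp, used earlier for the unconstrained examples; a sketch using Rsolnp's inequality-constraint arguments (solnp minimizes, so the objective is negated, and the artificial cap of 10 on $y$ is only there because solnp requires a finite upper bound):

```r
# Cross-check: maximize x^2 + 2y subject to x^2 + y^2 <= 5 and y >= 0.
# solnp minimizes, so we minimize -f; ineqfun must stay within [ineqLB, ineqUB].
library(Rsolnp)
fn   <- function(x) -(x[1]^2 + 2*x[2])
ineq <- function(x) c(x[1]^2 + x[2]^2,  # must be <= 5
                      x[2])             # must be >= 0
sol <- solnp(c(1, 1), fun = fn,
             ineqfun = ineq, ineqLB = c(0, 0), ineqUB = c(5, 10))
sol$pars   # converges to one of the two maximizers (+-2, 1), where f = 6
```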